A METHODOLOGY OF ESTIMATING THE CONFORMATION OF A PROTEIN BY PROTEOLYSIS

The invention relates to a novel method for determining the significance of 
polymorphisms or mutations in a nucleic acid molecule encoding a protein. 

Since the advent of gene sequencing technology in the late 1980's and the 
establishment of the human genome project in 1990 an enormous amount of 
information has been discovered about the sequence, or nature, of each gene in 
the human genome. Moreover, as the human genome project has developed,
the methods used to sequence genes have evolved considerably and this has
led to the detection of variations within genes. Given that a typical gene could 
be 30 kilobases in length and that variations occur on average every 1100 
bases, it follows that a tremendous amount of work needs to be undertaken in 
order to determine which variants are of clinical or technological significance.
However, this is a prerequisite step if one is to exploit the knowledge available in
the human genome project and so be in a position to understand, for example, 
the human condition, and particularly human diseases, and factors that may 
influence same and so lead to new therapies. 

Typically, investigators in the field of human genetics who have obtained the
sequence of the normal, or wild-type gene, set about looking for significant 
changes in the gene by sequencing nucleic acid molecules from individuals who 
are thought to harbour a gene variant. Such individuals are people exhibiting 
the symptoms of a specific disease which is thought to be related to the 
dysfunction of a particular gene. Once a gene variant has been sequenced and
compared with the wild-type further investigations are then undertaken to 
examine the cell biology of the protein encoded by the variant gene. The results 
of these investigations are then examined in the light of the physical symptoms 
in order to deduce a correlation. 

It therefore follows that unravelling the nature of a gene variant and relating it to 
function and then clinical symptoms is a long and tedious process, especially 
when one considers that a given gene can be 3.6% polymorphic. It is therefore 
apparent that simply identifying which variant to investigate further can be a 
difficult step in itself. This is true not only for the field of human genetics but also 
in respect of studies of other animal and plant species. 

With this in mind, we have developed a novel assay for quickly and efficiently 
determining the likely significance of a gene variant. 

Our novel methodology is based upon the basic structure of proteins. 

The basic structural unit of a protein is an amino acid. An amino acid consists of 
an amino group, a carboxyl group, a hydrogen atom and a distinctive R group,
conventionally known as the side chain, all bonded to a central carbon atom. There
are 20 standard amino acids and any number and combination of them are able to
join, via peptide bonds, to form a sequence, or chain, of amino acids known as a peptide.
Thus the sequence of bonds running the length of the peptide chain is known as 
the backbone. Additionally, intra and inter peptide chain linkages also exist, for
example, in the former instance the amino group of lysine can form a peptide 
bond with the gamma carboxyl group of glutamic acid; and, in the latter instance, 
bonds may also exist between side chains of amino acids as a result of the 
formation of disulphide bonds thus forming crosslinks between separate peptide 
chains. Adjacent peptide chains can therefore join to form a secondary structure 
such as dimers or trimers etc. The secondary structures can then fold, due to 
the nature of the interaction of adjacent amino acids, to form a three dimensional 
tertiary structure. This tertiary structure represents the active form of the protein 
and may comprise sites, or pockets, into which other molecules fit in order to 
activate the protein or allow the protein to respond thereto. 

Digestion, or break down, of proteins in a controlled fashion, occurs all the time 
during the process of alimentary digestion. A class of enzymes known as 
proteases perform this function. They basically attack specific bonds in order to 
cleave the protein at sites where these bonds exist. It follows that different 
proteins will have different susceptibilities to various enzymes depending upon 
their primary structure. 
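
By way of illustration only, the dependence of cleavage on primary structure can be sketched in Python using the commonly cited rule for trypsin: cleavage on the carboxyl side of lysine (K) or arginine (R), except where the next residue is proline (P). The function name and the example peptide are hypothetical and are not taken from the work described here. The sketch considers sequence alone and ignores folding, which governs whether such sites are accessible in the intact protein.

```python
def tryptic_fragments(sequence):
    """Split a one-letter amino acid sequence at predicted trypsin sites.

    Rule assumed here: cleave after K or R unless the following residue is P.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    # Whatever follows the final cleavage site forms the last fragment.
    fragments.append(sequence[start:])
    return fragments


# Hypothetical peptide: the R at position 3 is followed by P, so it is not cut.
print(tryptic_fragments("AKRPGKL"))
```

A substitution that introduces or removes a K or R changes the predicted fragments directly, whereas a substitution elsewhere can only alter digestion indirectly, through its effect on folding.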

Whilst all this information is known, no-one has thought to take advantage of it 
before in relation to genetics and, in particular, in relation to screening a number 
of genetic variants whose functional, or even clinical, significance is unknown. 
Accordingly, no-one has thought to use this information as a basis to tackle the 
large number of genetic variants that exist in order to determine which are the 
clinically, or technologically, significant variants.

However, we have used this information to develop a novel assay which can 
screen any number of variants, simultaneously if required, in order to determine 
which, if any, require further investigation. 

Our methodology is quick, efficient and inexpensive to perform.

Statements of Invention

According to the invention there is therefore provided a method for determining 
the significance of a given nucleic acid polymorphism or mutation, in a nucleic 
acid molecule, on the structural properties of a protein encoded by said nucleic 
acid molecule comprising: 

(a) exposing the protein encoded by said nucleic acid molecule to at least 
one protease; and 

(b) determining whether, or to what extent, proteolytic cleavage takes place; 
and, optionally, 

(c) comparing this proteolytic cleavage with that of the wild-type protein when 
exposed to the same protease(s). 

According to a further aspect of the invention there is provided a screening 
method for determining the significance of a plurality of variants of at least one 
gene comprising: 

(a) obtaining a sample of protein encoded by each of said variants;

(b) exposing each protein to at least one protease;

(c) determining whether, or to what extent, proteolytic cleavage takes place; 
and 

(d) comparing this proteolytic cleavage with that of the wild-type protein when 
exposed to the same protease(s). 

In a preferred embodiment of the invention when the above screening method is 
employed, the plurality of protein variants are exposed to a plurality of proteases 
and the corresponding proteolytic cleavage is determined. Most ideally, the 
screening methodology involves examining the plurality of variants relating to 
different genes. Thus, in a single batch, the plurality of variants corresponding 
to multiple genes are examined in respect of at least one protease, and ideally in 
respect of a plurality of proteases and a determination of proteolytic cleavage is 
made in respect of the digestion of each variant by each protease. 

It will be apparent to those skilled in the art that, using this methodology, where 
a plurality of proteases are employed a cleavage or digestion profile will be 
provided in respect of each variant and this profile can ideally be compared
with the digestion profile of the wild-type protein and so used to determine the 
functional significance of a variant of any one or more of the said genes. 
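
A minimal sketch of such a profile comparison is given below in Python, assuming that each profile records the percentage of protein remaining after digestion with each protease. The function names, the 10-point tolerance and all numerical values are hypothetical and serve only to show the shape of the comparison.

```python
def compare_profiles(wild_type, variant, tolerance=10.0):
    """Compare a variant digestion profile with the wild-type profile.

    Each profile maps a protease name to the percentage of protein remaining
    after digestion. The tolerance is an arbitrary illustrative threshold.
    """
    comparison = {}
    for protease, wt_remaining in wild_type.items():
        difference = variant[protease] - wt_remaining
        if difference > tolerance:
            call = "more resistant"
        elif difference < -tolerance:
            call = "more susceptible"
        else:
            call = "similar"
        comparison[protease] = (difference, call)
    return comparison


def warrants_follow_up(wild_type, variant, tolerance=10.0):
    """Flag a variant whose profile differs from wild-type for any protease."""
    calls = compare_profiles(wild_type, variant, tolerance).values()
    return any(call != "similar" for _, call in calls)


# Hypothetical values, not experimental data.
wt = {"trypsin": 60.0, "chymotrypsin": 50.0, "proteinase K": 40.0}
var = {"trypsin": 58.0, "chymotrypsin": 20.0, "proteinase K": 45.0}
print(compare_profiles(wt, var))
print(warrants_follow_up(wt, var))
```

Keeping the per-protease differences, rather than a single pass or fail value, preserves the direction of the change, which matters because a variant may become either more vulnerable or more resistant to digestion.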

In a preferred embodiment of the invention said protein encoded by said nucleic 
acid molecule or gene variant is exposed to a plurality of proteases and ideally 
different proteases which attack different bonds. Proteases that are suitable for 
use in the methodology of the invention include: trypsin, chymotrypsin,
proteinase K, aminopeptidase, carboxypeptidase, collagenase, elastase,
kallikrein, metalloendopeptidase, papain, pepsin, and indeed any other known
protease. 

Notably, where cleavage is different from that exhibited by the wild-type, one 
would conclude that the variant, or indeed a combination of variants, was 
significant. This is because the variant(s) would either render the protein more 
vulnerable to digestion or confer resistance to digestion as a result of 
alteration(s) to the tertiary, or structural, form of the protein. 

In yet a further preferred embodiment of the invention a plurality of proteins 
encoded by a plurality of genetic variants are tested in parallel and thus the 
methodology of the invention may be performed as a screening methodology 
where a plurality of incubation receptacles are filled with a corresponding 
plurality of proteins to be tested and then said proteins are exposed to a 
selected protease, or group of proteases, or vice versa, either simultaneously or 
successively. 

More preferably still, the methodology of the invention involves incubating the 
protein(s) to be tested with the said protease(s) under conditions that support 
the activity of the relevant enzyme(s). For example, this may involve exposing 
the test protein to the enzyme at a temperature at which the enzyme is optimally 
functional, such as 37°C, and for a time sufficient for the enzyme to perform its 
activity, for example between 15 minutes and 1.5 hours.

More preferably still, after a suitable length of time the incubation period is 
terminated, for example, by adding an enzyme inhibitor to the incubation 
receptacle. Finally, proteolytic cleavage is assessed using any conventional 
protein assay technique such as, for example, SDS-PAGE analysis either 
followed by staining the gel (coomassie blue or silver staining) or by western 
blotting. Optionally, additional studies may then be undertaken to determine the 
functionality of the protein variant. 

In yet a further preferred embodiment of the invention the technique undertaken, 
in order to determine the extent of proteolytic cleavage, involves assaying not 
only each test protein but also the wild-type protein that, ideally, has been 
exposed to the relevant enzyme(s) and, ideally also, a sample of the wild-type 
and test protein(s) that has not been exposed to the relevant enzyme(s). In this 
way, a positive and a negative control are included in the assay for the purpose 
of determining the amount of proteolytic cleavage that the test protein exhibits 
vis a vis the wild-type protein and also the background level of protein 
degradation experienced as a result of the assay conditions. 
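
The arrangement of test samples and controls described above may be sketched as follows in Python. The function name and the representation of an undigested sample as a pairing with None are assumptions made for illustration, as is the example variant label.

```python
from itertools import product


def assay_layout(variants, proteases, wild_type="WT"):
    """List every (sample, protease) incubation for one batch.

    The wild-type protein is digested alongside each variant, and every
    sample also has an undigested control, represented here by None.
    """
    samples = [wild_type] + list(variants)
    layout = [(sample, protease) for sample, protease in product(samples, proteases)]
    # Undigested controls reveal background degradation under assay conditions.
    layout.extend((sample, None) for sample in samples)
    return layout


for sample, protease in assay_layout(["variant A"], ["trypsin", "chymotrypsin"]):
    print(sample, protease if protease else "no enzyme")
```

For n variants and m proteases this gives (n + 1) x m digests plus n + 1 undigested controls.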

It is to be understood that the invention is not to be limited to the specific assay 
that is chosen to assess proteolytic cleavage, rather the invention, principally, 
lies in the use of the technique of proteolytic cleavage to assay the likely 
functional significance of genetic variants.

It follows from the information above regarding the tertiary structure of the
protein that the nature of the amino acids in the peptide chain will determine the 
protein folding and so susceptibility to different enzymes. In turn, the nature of 
the amino acids in the peptide chain will be determined by the nucleic acid 
coding sequence and so variations in this sequence will lead to variations at the 
amino acid level and so differential protein folding and thus variable 
susceptibility to proteolytic cleavage. 

Given that the assay is quick and efficient to perform, a whole range of proteins, 
each coded by a genetic variant for a given gene, or more than one gene, can 
be simultaneously assayed in order to determine which variant gives rise to a 
change in the tertiary structure of the protein and thus which is most likely to
affect the functioning of the protein. 

An embodiment of the invention will now be described by way of example only, 
with reference to variants in the growth hormone gene (GH1) and the following 
examples. 

In Figure 1 there is shown the digestion profile of a number of variants of the 
growth hormone gene when exposed to trypsin, chymotrypsin or proteinase K.

Experimental Subjects

The experimental subjects in which the mutations for the proteolysis study were 
identified are those described in the original Human Mutation paper, Millar et al. 

Two different patient groups were studied. The first comprised 41 unrelated
children of Caucasian origin with short stature (age 1-15 years) who matched the 
specific selection criteria applied in Cardiff (the "Cardiff criteria") outlined below. 
Details were taken of family history, clinical and auxological variables, and 
previously performed laboratory investigations (Table 1). Standard deviation
scores (SDS) were calculated for birth weight, height prior to GH treatment, body
mass index prior to GH treatment, height velocity immediately prior to GH
secretion testing, paternal and maternal heights, and the target for adult height
derived from parental measurements (Table 1). The degree of bone age delay
and the results of GH secretion tests were also noted. Blood samples for DNA
analysis were taken from the index case and appropriate close relatives.
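
For reference, a standard deviation score expresses a measurement as the number of reference standard deviations by which it departs from the reference mean for the relevant age and sex. A minimal Python sketch follows. The reference values shown are hypothetical, and the reference data actually used for the scores in Table 1 are not stated here.

```python
def sd_score(value, reference_mean, reference_sd):
    """Return (value - reference mean) / reference SD."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (value - reference_mean) / reference_sd


# Hypothetical example: height 100 cm against a reference of 110 cm (SD 5 cm).
print(sd_score(100.0, 110.0, 5.0))
```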

The second group comprised 11 unrelated patients with short stature and 
idiopathic isolated growth hormone deficiency (IGHD) in whom GH1 gene 
deletions had been excluded by Southern blotting. Eight of these individuals
came from families with two or more first-degree relatives with IGHD (familial
IGHD) whilst three individuals represented sporadic cases of IGHD. In only one of the
familial IGHD cases (family 37) did short stature appear to segregate as an 
autosomal dominant trait. Blood samples for DNA analysis were taken from 
available relatives in each family.

Control DNA samples were obtained from lymphocytes taken from 154 male
British army recruits of Caucasian origin who were unselected for height. Height
data were available for 124 of these individuals (mean: 1.76 ± 0.07 m) and the
height distribution was found to be normal (Shapiro-Wilk statistic W=0.984, 
p=0.16). Ethical approval for these studies was obtained from the Multi-Regional 
Ethics Committee (MREC). 

Patient Selection Criteria 

The key criterion for inclusion in this study was that the clinician assessing the 
child should have had sufficient concern with regard to the child's growth pattern 
to warrant GH secretion testing. The children selected exhibited a clinical 
phenotype that adhered to the following criteria, henceforth termed "Cardiff 
criteria": 

(a) sufficient clinical concern to have warranted GH secretion testing, 
regardless of the type of test, the test results, or indeed whether the child 
attended for testing; 

(b) no recognisable pathology likely to account for the observed growth failure; 

(c) short stature: defined as a predicted height trajectory below the lower limit 
of an individual's estimated target adult height, based upon the heights of 
that individual's parents (Tanner and Whitehouse 1976); 

(d) height velocity on or below the 25th percentile for age (uncorrected for bone
age); and

(e) evidence of bone age delay in those pre-pubertal when compared with
chronological age by reference to the Tanner-Whitehouse scale (TW2
method; Tanner et al., 1983). This delay should be of at least two years
except in children of < 5 years of age. 
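
The criteria (a) to (e) above can be expressed as a simple checklist, sketched here in Python. The record fields are hypothetical names introduced for illustration. Two readings of criterion (e) are assumed and are not confirmed by the text: that it is applied only to pre-pubertal children, and that for children under 5 years any positive bone age delay suffices.

```python
from dataclasses import dataclass


@dataclass
class ChildRecord:
    gh_testing_warranted: bool        # criterion (a)
    recognisable_pathology: bool      # criterion (b)
    below_target_height_range: bool   # criterion (c)
    height_velocity_percentile: float # criterion (d)
    prepubertal: bool                 # criterion (e) applies only if True (assumed)
    age_years: float
    bone_age_years: float


def meets_cardiff_criteria(child):
    """Return True if the record satisfies criteria (a) to (e) as read above."""
    if not child.gh_testing_warranted:
        return False
    if child.recognisable_pathology:
        return False
    if not child.below_target_height_range:
        return False
    if child.height_velocity_percentile > 25:
        return False
    if child.prepubertal:
        delay = child.age_years - child.bone_age_years
        if child.age_years >= 5 and delay < 2.0:
            return False
        # Assumption: under 5 years, any positive delay counts as evidence.
        if child.age_years < 5 and delay <= 0.0:
            return False
    return True


# Hypothetical record: 8 years old with a bone age of 5.5 years.
example = ChildRecord(True, False, True, 10.0, True, 8.0, 5.5)
print(meets_cardiff_criteria(example))
```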

Materials & Methods 

Polymerase chain reaction (PCR) amplification of a GH1-specific fragment

Genomic DNA was extracted from patient lymphocytes by standard procedures.
PCR amplification of a 3.2 kb GH1-specific fragment was performed as described
(2).

Cloning and sequencing of GH1 gene-specific PCR fragments

GH1 gene-specific (3.2 kb) PCR fragments were sequenced directly with BigDye
v3.0 (Applied Biosystems, Foster City, CA) and analysed on an ABI 3100 DNA
sequencer (Applied Biosystems) as described (2). Additional primers used for
sequencing in the reverse direction were GHBFR (5' TGGGTGCCCTCTGGCC 3';
-262 to -278), GHSEQ1R (5' AGATTGGCCAAATACTGG 3'; +215 to +198),
GHSEQ2R (5' GGAATAGACTCTGAGAAAC 3'; +785 to +767), GHSEQ3R (5'
TCCCTTTCTCATTCATTC 3'; +1281 to +1264), GHSEQ4R (5'
CCCGAATAGACCCCGC 3'; +1745 to +1730) [Numbering relative to the
transcriptional initiation site at +1; GenBank Accession No. J03071]. Samples
containing sequence variants were cloned into pGEM-T (Promega, Madison, WI)
followed by sequencing of a minimum of four clones per individual.

In vitro expression and assay of biological activity of GH variants

A cloned wild-type GH1 cDNA incorporating a His tag on the carboxy terminal was
modified using site-directed mutagenesis as previously described (3) to generate 
the GH variants. 

This vector was then transfected into High Five insect cells (Invitrogen) as 
previously described (3), and human GH in the culture supernatants quantified by 
ELISA (DRG Diagnostics, Marburg, Germany). The cross-reactivity in the ELISA 
of the GH variants and insect cell-expressed wild-type GH was confirmed by 
dilutional analysis to be equal to that of the assay reference preparation 
(calibrated against the MRC 1st IRP 80/505 reference preparation).

Proteolytic digestion of the GH variants 

Trypsin, chymotrypsin, or proteinase K (all Sigma, Poole, UK) were added to a
final concentration of 0.1 ng/ml to 100 µl culture medium harvested from insect
cells expressing either wild-type GH or a variant (60 nM) and then incubated at
37°C for 1 hr. Previous dose-dependent studies on wild-type GH had shown that
0.1 ng/ml was the lowest concentration at which GH degradation was detectable
by all three enzymes. After the 1 hr treatment period, 10 µl trypsin-chymotrypsin
inhibitor (500 µg/ml) was added to stop the trypsin and chymotrypsin digests and
1 µl PMSF (0.1 M) was added to stop the proteinase K digest. Each reaction was
then incubated for a further 15 mins at 37°C. Samples were analysed by SDS-PAGE
on a 12% gel using a mini gel apparatus (Bio-Rad Laboratories, Hercules,
CA). Equivalent amounts of undigested wild-type GH and variant that had been
incubated for 1 hour at 37°C were also run on the gel. The gel was electroblotted
onto PVDF membrane as previously described (6), probed with a mouse
monoclonal anti-human GH antibody (Lab Vision, Fremont, CA), diluted 1:500,
detected using an anti-mouse IgG-horseradish peroxidase (HRP) conjugate
(1:5000, Amersham Biosciences) and visualised by enhanced
chemiluminescence (ECL Plus, Amersham Biosciences). Films were analysed
using the Alpha Imager 1200 digital imaging system (Alpha Innotech Corp, San
Leandro, CA) and the results expressed as the amount of GH remaining following
enzyme digestion as a percentage of undigested GH. The experiments were
repeated 3 times and assessed statistically by a two-tailed t-test.
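
The quantification described above can be sketched as follows in Python, using only the standard library. The band intensities and replicate percentages are hypothetical, and the choice of an unpaired, pooled-variance Student's t statistic is an assumption, since the form of the t-test is not specified here. With three replicates per group there are 4 degrees of freedom, for which the two-tailed 5% critical value of t is 2.776.

```python
from math import sqrt
from statistics import mean, variance


def percent_remaining(digested_signal, undigested_signal):
    """Express the digested band intensity as a percentage of the undigested band."""
    return 100.0 * digested_signal / undigested_signal


def student_t(sample_a, sample_b):
    """Unpaired Student's t statistic with pooled variance, and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical percentages of GH remaining in three repeat experiments.
wild_type = [50.0, 55.0, 60.0]
variant = [20.0, 25.0, 30.0]

t, df = student_t(wild_type, variant)
T_CRITICAL_DF4 = 2.776  # two-tailed, alpha = 0.05, 4 degrees of freedom
print(round(t, 2), df, abs(t) > T_CRITICAL_DF4)
```

Normalising each digested lane to its own undigested control, before comparing variant with wild-type, removes differences in loading and in background degradation from the comparison.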

Molecular modelling 

The variants were structurally analysed by inspection of the appropriate variant 
amino acid residue in the X-ray crystallographic structure of human GH (PDB:
3HHR) [8]. The wild-type and mutant GH structures were compared with respect 
to electrostatic interactions, hydrogen bonding, hydrophobic interactions and 
surface exposure. Molecular graphics were performed using the ICM molecular 
modelling software suite (Molsoft LLC, San Diego, CA). 

Results 

Proteolysis Studies 

Figure 1 shows the results of enzyme analysis performed on a number of GH 
variants in order to determine which, if any, of these variants alter the structural 
properties of the protein and so are likely to interfere with the activity thereof.
Twelve variants were examined and it can be seen that with respect to the wild-type
(WT), left hand side of the Figure, the majority of these variants have an
effect on the susceptibility of the protein to proteolytic digestion. The variants
Thr27Ile and Gln91Leu were particularly vulnerable to proteolysis and, in each
case, proteolysis proceeded most efficiently using the enzyme chymotrypsin.
With reference to Figure 2 it can be seen that the variant Thr27Ile is predicted to
affect internal packing around its helix 1 and the loop between helix 2 and helix
3. This has important structural implications, which are reflected in the
data shown in Figure 1. Similarly, in the case of Gln91Leu the substitution
increases hydrophobicity and may affect solubility and folding. This has
implications for the structure of the protein and thus its susceptibility to
proteolysis.

In contrast, Arg16Cys and Lys41Arg, whilst showing different proteolysis
profiles compared to the wild-type, are less affected than the previous variants.
For Arg16Cys the predicted structural changes concern inter-molecular
bridging rather than adverse effects on the shape of the protein. This could
explain why the proteolysis profile is less affected. In the case of Lys41Arg, the
variant is thought to conserve ionic interactions but may lead to steric hindrance.
Again, the implication here is that the shape of the protein is likely to be
conserved.

Other variants, such as Val110Ile and Thr175Ala, showed only marginal
susceptibility to proteolysis; both were most resistant to proteolysis
by the enzyme chymotrypsin.

The results of this study show that GH variants can be characterised in terms of
their proteolysis signature in response to selected proteases and this information
represents a first step towards selecting clinically and technologically important
variants for further analysis.

Table 1

Children with short stature adhering to the specific ("Cardiff") selection criteria;
clinical and auxological variables, and laboratory investigations.

Key. P: patient number; S: sex; BW: birth weight standard deviation score (SDS);
GH: GH secretion test result (mIU/L); N: "normal"; T: GH secretion test type: I:
insulin tolerance test, C: clonidine, G: glucagon, E: exercise, R: random, X: test 
declined. H: height SDS; HA: age at height SDS and bone age assessment; BA: 
bone age in years; BMI: body mass index SDS; HV: height velocity SDS; Mat: 
maternal height SDS; Pat: paternal height SDS. 

Data from patients possessing GH1 gene lesions are shown in bold type. 
*one of two siblings with a similar phenotype. 
§ family history of GHD. 